Overview of MineSet


The Decision Tree Inducer and Classifier

The Decision Tree Classifier classifies data according to a set of attributes by making a series of decisions based on those attributes. The process is similar to using a biological key to identify plants. Applied to credit worthiness, for example, a decision tree might determine whether someone who owns a home, owns a car that cost between $15,000 and $23,000, and has two children is a good credit risk. The Decision Tree Inducer generates a decision tree classifier from a "training set" (a set of data that the user has already classified). The structure of the classifier's decision tree is then displayed using the Tree Visualizer, with each decision represented by a node of the tree. The graphical representation can help the user understand the classification algorithm and can provide valuable insights into the data. Finally, the classifier can be used to classify unclassified data.
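The series of decisions described above can be sketched in a few lines of Python. This is only an illustration, not MineSet's implementation: the tree is written by hand rather than induced from a training set, and the names (Node, Leaf, classify, credit_tree) and the particular splits are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Leaf:
    """A terminal node that carries the predicted class label."""
    label: str


@dataclass
class Node:
    """A decision on one attribute with a yes branch and a no branch."""
    attribute: str
    test: Callable[[object], bool]
    if_true: object   # Node or Leaf
    if_false: object  # Node or Leaf


def classify(tree, record):
    """Walk from the root to a leaf, making one decision per node."""
    while isinstance(tree, Node):
        passed = tree.test(record[tree.attribute])
        tree = tree.if_true if passed else tree.if_false
    return tree.label


# Hypothetical hand-built tree; a real inducer would choose these splits
# from the training set.
credit_tree = Node(
    "owns_home", lambda v: bool(v),
    Node("car_cost", lambda v: 15000 <= v <= 23000,
         Leaf("good"), Leaf("bad")),
    Leaf("bad"),
)

applicant = {"owns_home": True, "car_cost": 18000, "children": 2}
print(classify(credit_tree, applicant))
```

Each Node corresponds to one node that the Tree Visualizer would draw. Attributes the tree never tests (here, children) have no effect on the result.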

The Option Tree Inducer and Classifier

The Option Tree Classifier classifies data using a technique similar to the Decision Tree Classifier. Unlike decision trees, option trees can contain special option nodes, which allow the classifier to consider the influence of splitting on multiple attributes simultaneously. For example, an option node in an option tree built to identify a car's country of origin might choose miles per gallon, horsepower, number of cylinders, and weight as informative attributes. In a decision tree, a node can choose at most one attribute for consideration at a time. In an option tree, the results of all options are "voted" when performing classification. Option trees are often more accurate than decision trees; however, they are generally much larger.
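The difference between an ordinary decision node and an option node can be sketched as follows. This is a minimal illustration that assumes a simple majority count as the voting rule; the source does not say how MineSet combines the votes. The helper names (leaf, decision, option, vote, classify), the thresholds, and the country labels are all hypothetical.

```python
from collections import Counter


def leaf(label):
    return {"kind": "leaf", "label": label}


def decision(attribute, threshold, below, at_or_above):
    """An ordinary node: tests exactly one attribute."""
    return {"kind": "decision", "attribute": attribute,
            "threshold": threshold,
            "below": below, "at_or_above": at_or_above}


def option(*alternatives):
    """An option node: keeps several alternative subtrees side by side."""
    return {"kind": "option", "alternatives": list(alternatives)}


def vote(node, record):
    """Return a Counter of the class labels voted for beneath this node."""
    if node["kind"] == "leaf":
        return Counter({node["label"]: 1})
    if node["kind"] == "decision":
        below = record[node["attribute"]] < node["threshold"]
        return vote(node["below" if below else "at_or_above"], record)
    # Option node: follow every alternative and pool the votes.
    total = Counter()
    for alternative in node["alternatives"]:
        total += vote(alternative, record)
    return total


def classify(node, record):
    """Predict the label with the most votes."""
    return vote(node, record).most_common(1)[0][0]


# Hypothetical option node over the four attributes named in the text.
origin_tree = option(
    decision("mpg", 25, leaf("US"), leaf("Japan")),
    decision("horsepower", 100, leaf("Japan"), leaf("US")),
    decision("cylinders", 6, leaf("Japan"), leaf("US")),
    decision("weight", 3000, leaf("Japan"), leaf("US")),
)

car = {"mpg": 30, "horsepower": 150, "cylinders": 8, "weight": 3500}
print(vote(origin_tree, car))
print(classify(origin_tree, car))
```

Because every alternative under an option node is kept and evaluated, the tree grows with each option node, which is consistent with option trees being much larger than decision trees.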

The Option Tree Inducer generates an Option Tree classifier from a training set in much the same way that the Decision Tree Inducer generates a Decision Tree. The induced option tree is displayed using the Option Tree Visualizer. This visualization helps you understand the classifier and provides insight into which attributes are important in determining the value of the label. Besides being visualized, the classifier can be used to classify unlabeled data.

The Evidence Inducer and Classifier

The Evidence Classifier classifies data by examining the probabilities of a specified result occurring based on a given attribute. For example, it might determine that someone who owns a car that cost between $15,000 and $23,000 has a 70% chance of being a good credit risk and a 30% chance of being a bad credit risk. The classifier predicts the class with the highest probability based on a simple probabilistic model.
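The 70%/30% example above can be reproduced with a small counting sketch. It is deliberately limited to a single attribute and estimates each probability as a plain relative frequency in the training set; how MineSet combines evidence from several attributes is not described here and is not modeled. The function names, the attribute name car_cost_band, and the training records are invented for illustration.

```python
from collections import Counter, defaultdict


def induce(training_set, attribute, label):
    """Count how often each class occurs for each attribute value."""
    counts = defaultdict(Counter)
    for record in training_set:
        counts[record[attribute]][record[label]] += 1
    return counts


def probabilities(counts, value):
    """Relative frequency of each class among records with this value."""
    seen = counts.get(value, Counter())
    total = sum(seen.values())
    return {cls: n / total for cls, n in seen.items()}


def classify(counts, value):
    """Predict the class with the highest estimated probability.

    Values never seen in training are not handled in this sketch.
    """
    probs = probabilities(counts, value)
    return max(probs, key=probs.get)


# Hypothetical training set: ten owners of cars in the $15,000-$23,000
# band (seven good risks, three bad) and four owners of cheaper cars.
training_set = (
    [{"car_cost_band": "15000-23000", "risk": "good"}] * 7
    + [{"car_cost_band": "15000-23000", "risk": "bad"}] * 3
    + [{"car_cost_band": "under 15000", "risk": "good"}] * 1
    + [{"car_cost_band": "under 15000", "risk": "bad"}] * 3
)

model = induce(training_set, "car_cost_band", "risk")
print(probabilities(model, "15000-23000"))
print(classify(model, "15000-23000"))
```

The dictionary returned by probabilities holds the kind of per-value class breakdown that a pie chart for one attribute value would show.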

The classifier is first generated from a training set, as with the decision tree classifier. The analysis of the data is displayed using the Evidence Visualizer, which shows pie charts illustrating the different probabilities. This graphical representation can help the user understand the classification algorithm, provide valuable insights into the data, and answer "what if" questions. Finally, the classifier can be used to classify unclassified data.